Physical contact remains difficult to trace in large metropolitan networks, though it is a key vehicle for the 
transmission of contagious outbreaks. Co-presence encounters during daily transit use provide us with a 
city-scale time-resolved physical contact network, consisting of 1 billion contacts among 3 million transit 
users. Here, we study the advantage that knowledge of such co-presence structures may provide for early 
detection of contagious outbreaks. We first examine the "friend sensor" scheme - a simple, but universal 
strategy requiring only local information - and demonstrate that it provides significant early detection of 
simulated outbreaks. Taking advantage of the full network structure, we then identify advanced "global 
sensor sets", obtaining substantially longer early-warning times than the friend sensor scheme. Individuals 
with the highest number of encounters are the most efficient sensors, with performance comparable to 
individuals with the highest travel frequency, exploratory behavior and structural centrality. An efficiency 
balance emerges when testing the dependency on sensor size and evaluating sensor reliability; we find that 
substantial and reliable lead-time could be attained by monitoring only 0.01% of the population with the 
highest degree. 



Digital traces generated by citizens during their commute across metropolitan transportation networks are 
helping answer long-standing questions on topics from individual mobility to collective interaction 
patterns. A series of landmark papers examining multiple large-scale digital traces has shifted the 
understanding of individual mobility patterns from random to highly structured and predictable 1-5. This has 
important implications for urban dynamics and epidemiology, particularly as the reproducible structure of 
metropolitan face-to-face encounters significantly shapes the spatial-temporal dynamics of disease 
spreading 6,8. Therefore, advances in deciphering metropolitan encounter patterns play an important role in 
the detection and mitigation of contagious outbreaks 9-11.

In detecting and containing contagious outbreaks, it is crucial to identify "super-spreaders", as they may 
provide significant lead indicators for the early response of public health agencies 12,13. To measure an 
individual's importance in spreading processes, various centrality measures, such as degree, betweenness, 
closeness 14, k-shell index 13 and activity potential 15, have been applied to theoretical diffusion models. 
Recent empirical works have confirmed the importance of these diverse measurements in real-world diffusion 
processes 13,15-20. To obtain such measurements, full knowledge of the contact network structure is usually 
required; however, other than by simulating human interaction at this level of resolution 6,10,21, mapping 
such a structure from real-world physical contact processes can be expensive in data collection, 
computationally costly, laborious in the filtering of spurious connectivity, and privacy-sensitive 7,22-24. 
This has been particularly true for large metropolitan contact networks, where the availability of citywide 
datasets is still limited 25,26.

Disease monitoring is extremely costly, privacy-sensitive, and involves enormous technical difficulties. A 
low-cost contact network structure constructed from transit use may provide a way to design efficient 
monitoring strategies using a small fraction of the population. In this work, we examine the largest 
metropolitan encounter dataset collected to date - travel smart card data from all of Singapore's bus users, 
covering approximately 3 million users during 1 week. From the tapping-in/tapping-out records of these 
public transit services, we built a large-scale high-resolution physical contact network. In a recent study 
based on this
Figure 1 | Modeling contagious outbreaks in a city-scale physical contact network. (A) Simulated infection 
processes from one infectious individual (red square). The encounter network is drawn in two layers: the 
effective infection path (solid links in full color) and the remaining physical encounters (thin links 
with opacity). (B) Temporal (hourly) change in the numbers of susceptible, exposed and infected people 
across the population. The results come from one simulation with contagion rate β = 0.0015, demonstrating 
how transit users become infected from day to day. (C) Temporal ratio of infected and susceptible from 20 
simulations with different contagion rates β. The solid curves show the average ratios ⟨I_P/N_P⟩ over 20 
runs and error bars indicate the standard deviation. The dashed curves show the average trend of the 
infected ratio ⟨I_S/N_S⟩ of the 1% friend sensors. Lead-time can be estimated by checking the time 
difference when ⟨I_P/N_P⟩ reaches a certain value. (D) Number of hourly infection incidences during the 
week, from the same simulation run as in panel (B). The orange dashed curve and the blue solid curve 
illustrate the temporal variation in population C and the selected friend group S, respectively. 



dataset, we demonstrated that physical encounters display a significant degree of temporal regularity and 
that these rhythmic interactions form a large-scale spatial-temporal contact network, spanning all of 
Singapore for the whole week 5. That study emphasized that encounters at this fine-grained scale are also 
highly structured, and far from random. Whereas the former study identified the global behavioral 
properties that generate this citywide co-presence network, the present study seeks to identify the 
network properties of key individuals that can be exploited to combat the spread of infectious disease. 

As an alternative to constructing a global structure of contact networks, recent research shows an 
increasing interest in crowd-sourcing as a potential strategy to detect contagious outbreaks, from using 
declared "friends as sensors", to aggregated search engine queries, to social media 27-32. Although these 
methods are based on simple principles and require only small slices of information, they show great 
advantages in providing early warning. Still, interesting questions remain in comparing the possible gains 
of using full knowledge vs. local methods in a city-wide epidemiological scenario. We perform such a 
comparison in this high-resolution network, as first evidence of its kind at a population and 
metropolitan level. 

Results 

To explore the dynamics of city-scale contagious outbreaks, we applied a general 
Susceptible-Exposed-Infected (SEI) model 33 to simulate the spreading processes (see Methods and 
Supplementary Note 1). Briefly, a simulation run is initialized with ten infectious people (as index 
cases), who are selected randomly among all transit users on Saturday. In the temporal weighted physical 
encounter network (with each contact as an edge and its duration as weight), an infectious individual i 
will transmit disease to neighbor j with probability p_ij = β d_ij per 20 seconds (contagion rate β is a 
universal parameter across the population and d_ij represents encounter duration; see Fig. 1A for an 
example). Once a susceptible individual gets exposed, he/she becomes infectious after 2 hours, starting to 
spread the disease to other susceptible people. As almost all transit journeys are shorter than 2 hours, 
the introduction of this exposure stage prevents one from getting infected and then infecting others 
directly during the same journey (which would significantly boost the spreading, as the instantaneous 
network for a vehicle is always fully connected). Note that β = 0.003 is used in a high-resolution contact 
network in Refs 7,20; we apply comparable values in our simulations. The full temporal resolution enables 
us to simulate the spreading processes during the whole week based on the proposed scheme for detecting 
contagious outbreaks, by registering infection time and transmission pathway at the individual level 
(Fig. 1B). 
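
As an illustration of the per-contact transmission rule, the following is a minimal sketch rather than the authors' implementation. It assumes that the encounter duration is expressed in 20-second units (one reading of "per 20 seconds") and that the probability is capped at 1; the function name and the cap are our own assumptions.

```python
def transmission_probability(beta: float, duration_seconds: float,
                             unit_seconds: float = 20.0) -> float:
    """Return p_ij = beta * d_ij for one encounter.

    Assumption: d_ij is the encounter duration counted in 20-second units.
    Assumption: the result is capped at 1.0 so it remains a probability.
    """
    d_ij = duration_seconds / unit_seconds
    return min(1.0, beta * d_ij)


# A 10-minute shared ride with beta = 0.0015 gives 30 units of exposure.
p = transmission_probability(0.0015, 600.0)
```

Under these assumptions, a 10-minute encounter at β = 0.0015 yields a transmission probability of about 0.045.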

As mentioned above, a simple but effective strategy for the early detection of contagious outbreaks 
without mapping the detailed structure of a social network is to find friend sensors from the 
population 22. The principle behind this method is that a randomly selected "friend" (neighbor; in a friend 
group) of one vertex (in a control group) has a higher degree on average when the network has a 
heterogeneous degree distribution, implying that the friend group is more central than the control group 
(or the population as a whole). This is commonly referred to as the "friendship paradox": your friends 
have more friends than you do 34. However, as social links initiated by physical encounters with strangers 
display a significant degree of heterogeneity, it remains unclear whether the friend sensor scheme 
- obtained from a static network structure - works in temporal spreading processes. Hence, to assess the 
performance of the friend sensor scheme, we conducted multiple simulation experiments with different 
contagion rates β. In each simulation, we first select 1% of individuals from population P randomly as a 
control set C := {c_i | c_i ∈ P}; the corresponding sensor group S is composed of randomly selected 
neighbors of each individual in C (S := {s_i | s_i ∈ N(c_i), c_i ∈ C}, where N(c_i) is the neighbor set of 
individual c_i). Note that S is a list instead of a set since an individual might be selected repeatedly 
from different N(c_i). After obtaining results from 20 simulations, we measured the average infected 
ratio ⟨I_S/N_S⟩ of the sensor groups and ⟨I_P/N_P⟩ of the whole population over time, finding that friend 
sensors have large lead-times (Fig. 1C). Given the heterogeneous individual participation and the size of 
the time window 35, spreading exhibits a linear increase - instead of a saturation process - after the 
explosive stage. 
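
The selection of control and sensor groups described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the static contact network is assumed to be a dictionary mapping each individual to a set of neighbors, and the function names `sample_control` and `friend_sensors` are hypothetical.

```python
import random


def sample_control(population, fraction, rng):
    """Draw a random control set C containing `fraction` of the population."""
    k = max(1, round(fraction * len(population)))
    return rng.sample(population, k)


def friend_sensors(neighbors, control, rng):
    """Pick one random neighbor per control individual.

    Returns a list, not a set: the same person may be chosen repeatedly
    from different neighbor sets N(c_i). Individuals without neighbors
    contribute no sensor (an assumption of this sketch).
    """
    return [rng.choice(sorted(neighbors[c])) for c in control if neighbors[c]]


# Toy star network: hub 0 meets everyone, leaves meet only the hub.
star = {0: {1, 2, 3, 4, 5}, 1: {0}, 2: {0}, 3: {0}, 4: {0}, 5: {0}}
control = [1, 2, 3, 4, 5]
sensors = friend_sensors(star, control, random.Random(0))
mean_degree_control = sum(len(star[c]) for c in control) / len(control)
mean_degree_sensors = sum(len(star[s]) for s in sensors) / len(sensors)
```

In this toy network every leaf selects the hub, so the sensor list has a mean degree of 5 against 1 for the control group, which is the friendship paradox in its most extreme form.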

In Fig. 1D, we show the temporal change of infection incidence i(t) from the same simulation as Fig. 1B. 
The sensor group S is obtained by the same selection scheme; however, in this case, the control group C is 
the whole population. Together with Fig. 1C, we found that spreading in S not only happens earlier, but 
also faster than in the whole population, suggesting that the lead-time also varies with time (or infected 
ratio; see Supplementary Note 3 and Supplementary Figs. S1 and S2). Notably, although the temporal 
structure is not used in finding sensors, the friend sensor scheme is still efficient in the early 
detection of outbreaks in our simulation experiments. 

Considering that friend sensors are identified locally without using any centrality measures, they could 
be representative of a universally applicable strategy when it is costly or impossible to map the detailed 
network structure. To assess the relative performance of friend sensors in a comparable manner, we 
quantified an individual's importance based on both network structure and individual travel behavior, 
employing the following centrality measures (see Supplementary Note 2): (1) Degree k, measuring the total 
number of contacts of each individual during the week; (2) Travel frequency f, the frequency of taking 
public transit services (f could also be interpreted as the number of activities in temporal networks 15); 
(3) Shell index k_s, taken from k-shell decomposition 13 on the static network; and (4) Encounter 
entropy S, capturing the temporal diversity of encounters: 

\[ S = -\sum_{t} p_t \ln p_t \qquad (1) \]

where p_t is the probability of an individual's physical encounter in time t (hourly). Using time-stamped 
encounter transactions, we can build the whole contact network and determine each individual's centrality 
for both control and sensor sets (see Fig. 2). 
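
Equation (1) can be illustrated with a short sketch. It assumes that each encounter of an individual has already been assigned to an hourly bin, and that p_t is estimated as the share of that individual's encounters falling in bin t; the function name and input layout are hypothetical.

```python
import math
from collections import Counter


def encounter_entropy(encounter_hours):
    """Shannon entropy (in nats) of one individual's encounters over hourly bins.

    `encounter_hours` holds one hour index per encounter.
    Assumption: p_t is the empirical share of encounters in hour t.
    """
    counts = Counter(encounter_hours)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())


# Encounters spread evenly over two distinct hours give S = ln 2.
s_two_hours = encounter_entropy([8, 8, 18, 18])
```

Evenly spread encounters over m hourly bins give S = ln m under this estimate, which corresponds to the peaks at ln 1, ln 2, ln 3 and ln 4 noted for Fig. 2D.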

Indeed, a sensor group is more central than the randomly selected control group in terms of degree k 
(Fig. 2A); however, it is not yet known whether the friendship paradox applies to other measures related 
to travel behavior (other than network structure). Before looking for additional sensors, we first 
measured the other centrality distributions P(f), P(k_s) and P(S) using both population and friend 
sensors. Although most people traveled fewer than 5 times during the week, we still found that P(f) was 
characterized by a heavy tail across the population, indicating a significant degree of heterogeneity in 
individual transit use patterns (Fig. 2B). Moreover, we found that P(f) of the sensor group clearly 
exhibited the friendship paradox as well, indicating that the people you have encountered on buses travel 
more often than you do. Using the same control and sensor groups, we then obtained the distributions 
P(k_s) and P(S). As Fig. 2C and D demonstrate, the friendship paradox also appears in terms of shell 
index k_s and encounter entropy S, suggesting that friend sensors have higher k-shell indexes and show 
higher temporal encounter diversity than the population. Taken together, Fig. 2 suggests that the simple 
friend sensor scheme can universally identify more centrally located social sensors. Nevertheless, as the 
percentiles show (in all Fig. 2 panels), there are still significant differences between the most central 
individuals and friend sensors, further indicating that the efficiency of friend sensors might be limited. 
As one might expect, the simplicity of the friend sensor scheme also prevents it from performing more 
efficiently, as better sensors 
Figure 2 | The "friendship paradox" exhibited in the temporal encounter network. (A) Degree distributions 
P(k) of the population and their neighbors (friends). The average degrees are ⟨k⟩_control = 238.5 and 
⟨k⟩_sensor = 442.0, respectively. (B) Probability density function P(f) of stage frequency of the 
population and neighbor set. The inset shows the same plot in semi-log scale. The mean values are 
⟨f⟩_control = 8.0 and ⟨f⟩_sensor = 13.0. (C) Probability density functions P(k_s) of shell index k_s. The 
mean values are ⟨k_s⟩_control = 120.5 and ⟨k_s⟩_sensor = 167.3. (D) Distribution of encounter entropy S. 
The density function P(S) has concentrated peaks around ln 1 = 0, ln 2 = 0.693, ln 3 = 1.099 and 
ln 4 = 1.386, resulting from individuals with homogeneous encounters in the corresponding number of 
intervals. The mean values of encounter entropy are ⟨S⟩_control = 1.35 nat and ⟨S⟩_sensor = 2.00 nat. The 
purple dashed lines in all these panels (from left to right) indicate the 90th, 99th, 99.9th and 99.99th 
percentiles of the corresponding values across the whole population, illustrating the degree of 
heterogeneity among the most centrally located individuals, friend sensors and the population as a whole. 

could always be obtained by using more information on contact 
structure. 

We next compare the performance of the best sensors identified by each centrality measure against friend 
sensors by quantifying lead-time on a universal scale. When individual infection times cannot be obtained 
across the whole population, lead-time is generally estimated as the difference between control and sensor 
samples 22. However, since transit services are generally not operated 24 hours a day, the cumulative 
infection curve is not strictly monotonically increasing during the monitoring period in our case, 
resulting in significant differences when calculating lead-time from multiple runs; thus, instantaneous 
lead-time is a biased measure of sensor performance (see Supplementary Note 3 and Supplementary Fig. S2). 
Given that individual infection times can be traced from simulations, we can instead quantify lead-time 
against the whole population rather than a small sample control group. For efficient early detection, we 
fixed the monitored infected ratio α = [α_1, α_2) = [0.05, 0.25) and measured only the difference in 
infection time of people in α, obtaining infection times t_P = {t_i | α_1 ≤ F_P(t_i) < α_2} from the 
population and t_S = {t_i | α_1 ≤ F_S(t_i) < α_2} from the sensor group (F represents the empirical 
distribution of exposed time). We re-define lead-time as the difference between the averages of t_P and 
t_S: 

\[ T = \langle t_P \rangle - \langle t_S \rangle \qquad (2) \]
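
The lead-time in equation (2) can be sketched as below. This is a hedged illustration, not the authors' procedure: the empirical distribution F is approximated by the rank of each infection time divided by the group size, and the function names are hypothetical.

```python
def window_times(infection_times, group_size, alpha1=0.05, alpha2=0.25):
    """Infection times whose cumulative infected fraction lies in [alpha1, alpha2).

    Assumption: F(t_i) is approximated as rank / group_size, where rank is
    the position of t_i among the sorted infection times of the group.
    """
    selected = []
    for rank, t in enumerate(sorted(infection_times), start=1):
        if alpha1 <= rank / group_size < alpha2:
            selected.append(t)
    return selected


def lead_time(pop_times, pop_size, sensor_times, sensor_size,
              alpha1=0.05, alpha2=0.25):
    """T = mean(t_P) - mean(t_S); positive when sensors are infected earlier."""
    t_p = window_times(pop_times, pop_size, alpha1, alpha2)
    t_s = window_times(sensor_times, sensor_size, alpha1, alpha2)
    if not t_p or not t_s:
        return None  # the outbreak never entered the monitored window
    return sum(t_p) / len(t_p) - sum(t_s) / len(t_s)


# Toy case: sensors follow the same curve as the population, 3 hours earlier.
pop = [float(h) for h in range(1, 31)]
sen = [h - 3.0 for h in pop]
T = lead_time(pop, 100, sen, 100)
```

In this toy case the sensor curve is the population curve shifted 3 hours earlier, so the sketch returns a lead-time of 3 hours.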



SCIENTIFIC REPORTS | 4:5099 | DOI: 10.1038/srep05099 



[Plot panels omitted; axis labels: Sensor (%), n (%); legend: contagion rates 0.001, 0.0015, 0.002, 0.005]



Figure 3 | (A)-(D) Mean and standard deviation of lead-time for sorted slices (1%) obtained by (A) degree 
k, (B) frequency f, (C) k-shell index k_s and (D) encounter entropy S. In panel (A), the dashed line and 
error bars show the lead-time provided by 1% friend sensors as a guide. As no centrality measure is used in 
identifying friend sensors, lead-time will not change by choosing alternative control groups. All curves 
demonstrate an approximately monotonic increase - except for sensors identified by k_s; the top 1% even 
fall behind friend sensors when β = 0.005. 



Next, we ordered individuals according to their centrality measure and divided the whole population into 
100 percentiles. Using each percentile as a sensor group, we performed 20 simulation runs and measured the 
corresponding lead-times under different contagion rates β. As Fig. 3 shows, the top 1% slices from all 
these partitions are able to provide early detection; however, the lower the average centrality, the 
shorter the lead-time T will be. For example, the sensor group provides no advanced detection when 
k ≈ k_0.4 and even falls behind the general population when k < k_0.4 (k_0.4 is the 40th percentile of 
degree). In this case, lead-time may reach infinity if the spreading cannot reach α_2 (25%) among the 
sensor group. By comparing these centrality measures jointly, we found that they actually vary 
consistently in sensor composition; however, none outperforms the others significantly (see Supplementary 
Fig. S3). 
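
The partition into 100 sorted slices can be sketched as follows. This is a minimal sketch, assuming the centrality values are held in a dictionary keyed by individual; `percentile_slices` is a hypothetical helper name.

```python
def percentile_slices(centrality, n_slices=100):
    """Rank individuals by centrality (highest first) and cut into equal slices.

    Slice 0 holds the top 1% when n_slices is 100; each slice can then be
    used as one candidate sensor group.
    """
    ranked = sorted(centrality, key=centrality.get, reverse=True)
    n = len(ranked)
    return [ranked[i * n // n_slices:(i + 1) * n // n_slices]
            for i in range(n_slices)]


# Toy population of 200 people whose degree equals their ID.
degree = {person: float(person) for person in range(200)}
slices = percentile_slices(degree)
top_one_percent = slices[0]
```

Integer arithmetic on the slice boundaries keeps the groups as equal as possible when the population size is not a multiple of the number of slices.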

The efficiency of using such sensors to detect contagious outbreaks depends not only on centrality 
measures, but also on sensor size |S|. On one hand, a small sample size induces large variation, providing 
poor reliability in potential applications. On the other hand, the difference in average centrality 
becomes more and more significant as the sample shrinks, given the intrinsic heterogeneity of individual 
behavior, suggesting that we may achieve longer lead-time at lower cost (if the cost is in proportion to 
sensor size). In Fig. 4A, we chose degree as the primary centrality and measured lead-time for 
logarithmically spaced sampling rates n = |S|/|P|, spanning from 0.001% (only 27 people with the highest 
degree) to 100% (the full population is used as sensors; lead-time is zero in this case). As the figure 
shows, a smaller sample size indeed provides longer lead-time, but with larger variation. In Fig. 4B, we 
show the performance of friend sensors obtained from equally sized control groups. Given that the sensor 
group is always sampled from a deterministic population, we observed a constant average lead-time, 
independent of sampling rate n. However, the standard deviation of lead-time decreases as sample size gets 
larger 
Figure 4 | Effect of sensor size on efficiency and reliability in detecting contagious outbreaks. (A) 
Lead-time provided by sensors with the highest degree, with sampling rate n = |S|/|P| in a logarithmically 
spaced interval spanning from 0.001% to 100%, for contagion rates β = {0.001, 0.0015, 0.002, 0.005}. The 
error bars correspond to the standard deviation of T. (B) Lead-time provided by friend sensors identified 
by random control groups C of different sizes. Given that sensors are characterized by the same 
distribution, lead-time exhibits a convergence pattern with the increase of sample size. In fact, as 
sampling rate n increases, the variance of T determined from one particular simulation run reduces, 
resulting in the decreasing overall variance. (C) Distribution P(T) of lead-time T given different sensor 
sizes |S| when contagion rate β = 0.001, corresponding to panel (A). (D) The same plot as panel (C), but 
for friend sensors, corresponding to panel (B). 

in both Fig. 4A and B, corresponding to the law of large numbers 
when calculating lead-time in each simulation. 

In practice, one should not just consider average lead-time and monitoring cost of such sensors; their 
reliability is equally important. To evaluate sensor reliability, we created a simulation result set with 
500 runs and measured the lead-time distribution P(T) for contagion rate β = 0.001. As Fig. 4C shows, the 
lead-time of different sensor groups (in terms of sensor size) is well characterized by normal 
distributions, however, with significant differences in mean and variance. Notably, the top 0.01% group 
performs extremely well in both average lead-time and reliability. Fig. 4D shows results of the same 
analysis for the friend sensor scheme. We observed that the larger the sensor size is, the more reliable 
the lead-time becomes; however, increasing sensor size does not raise average performance, consistent with 
what Fig. 4B shows. We also applied this procedure to the other centrality measures - frequency f, k-shell 
index k_s and encounter entropy S - finding that the sensor group identified by degree outperforms all 
other centrality measures (see Supplementary Note 4 and Supplementary Fig. S4). Taken together, Fig. 4 
suggests that the friend sensor scheme indeed provides a substantial lead-time in early detection; 
however, its inherent principle prevents it from performing better by adjusting sensor size (in other 
words, average performance is independent of sensor size), whereas a well-defined sensor group (obtained 
by degree centrality in this case) can easily outperform it. Our results further illustrate a clear 
advantage of deriving sensors from the full co-presence network, providing longer, more reliable lead-time 
by using a smaller sensor group. 
Discussion 

To summarize, we show the feasibility of a friend sensor scheme in providing early detection during a 
contagious outbreak in a metropolitan physical contact network. Indeed, the simple friend sensor scheme, 
which does not require a detailed network structure, works consistently well in finding sensors that are 
more central in the network. However, since all friend sensors are drawn from a deterministic neighbor 
population based on network structure, their performance is limited by the inherent characteristics of 
that neighbor population, providing constant early warning on average, independent of sample selection and 
sample size. Therefore, it is still crucial to show the value of the full network structure, in particular 
for the early detection of contagious outbreaks. Taking advantage of individual-based passive data 
collection techniques at city scale (transit fare collection systems in this paper), we mapped detailed 
spatial and temporal structures as a whole and identified new sensors using diverse centrality measures, 
offering new insight into finding more efficient social sensors. Considering the weak, passive and 
indirect nature of social links enabled by these common daily physical encounters, k-shell index k_s - a 
well-defined structural centrality - is less effective than the simple degree k and frequency f (number of 
activities) in contagion detection. Note that we did not use betweenness and closeness centralities as 
measures in our study. On one hand, computing shortest path-based centralities is extremely time-consuming 
because of this network's high density. On the other hand, considering the temporal nature of daily 
encounters, the role of static shortest paths is not as significant as it is in social networks of 
personal relations. Based on the spreading settings examined in our study, a well-defined social sensor 
group based on degree may account for only 0.01% of the total population; however, it provides longer and 
more reliable lead-time than the friend sensor scheme, allowing public health officials and governments to 
plan a quick and efficient response. In practice, those sensor individuals can be easily identified by 
transit agencies given their unique smart card IDs, and the remaining question is how to monitor the 
health status of sensor groups. Although sensor groups are deterministic given our observations, it may 
still not be easy for public health agencies to monitor their status owing to privacy and technical 
issues, which are beyond the scope of our study. Nevertheless, one possibility is to use emerging ICT in 
health monitoring - such as health-care applications on smartphones - and ask these individuals to 
voluntarily provide anonymous information to public health agencies. Another possibility, assuming that 
health authorities could have access to the identities of these sensor group individuals, is, instead of 
monitoring them directly, to track their appearance as cases of selected contagious diseases reported by 
hospitals and local clinics. Were these sensor groups to appear with a certain statistical trend in 
clinical reports, we would have in our hands an early-warning signal for the future advancement of the 
contagious disease. 

Having said that, influenza-like diseases are transmitted primarily by close contacts. Although the 
network used in our study is created across the whole metropolitan area, capturing all transit users' 
contacts during a whole week, it still covers only a small slice of all potential contacts in our daily 
life, forming only a subnetwork of the network of contacts that would be important in the spread of actual 
epidemics. On the other hand, to simulate an outbreak, we fixed relatively unrealistic simulation 
settings, such as introducing only a 2-hour exposed period and using a simplified SEI model instead of a 
fully developed SIR or SEIR model, for the outbreak to travel at a speed where global and local 
methodologies could be tested. To what extent the simulation can match a real contagious outbreak and the 
relevance of the simulation findings to actual epidemics remain to be measured. Thus, it is important to 
note that the specific results in our study are embedded in the physical encounter network with a 
pre-defined spreading mechanism. Such encounters on transit vehicles occur more often between perfect 
strangers than among friends, colleagues or families, making the network incomplete for predicting 
epidemic spreading via all possible transmission pathways. Therefore, great caution should be exercised in 
interpreting the results. In reality, a full contact network for disease spreading consists of all social 
links from diverse circumstances; it remains unclear to us which part should be given priority with 
respect to the characteristics of an unknown virus/disease. Nevertheless, with the rapid development of 
information and communication technologies, mapping the whole structure of close encounters from various 
data would be far less difficult and laborious today. Given the high individual and collective 
regularities rooted in human behaviors 2-5, patterns of face-to-face encounters in various settings could 
be documented as well 7,24, helping us build more comprehensive agent-based models to contain emerging 
epidemics 10,36. Moreover, with our increasing knowledge about ourselves and various microorganisms around 
us, more efficient social sensors for different scenarios can be identified and applied in monitoring 
contagious spreading from day to day, providing early and accurate information to support better decision 
making. We believe that our work can serve as a base to help better combat the spread of disease on a 
citywide scale 37,38 and better understand social contagion dynamics 39-41. 

Methods 

Data sets. Trip records were collected from Singapore's smart-card-based fare collection system, covering 
more than 96% of public transit trips. The system collects data for both bus and MRT (Mass Rapid Transit, 
railway-based) modes. Smart card data is widely used in public transit for network planning, service 
adjustments, ridership statistics, and indicators of service performance. We employ bus, not MRT, trip 
records in this study, since it is difficult to identify close proximity interactions on large MRT trains. 
For buses, once a smart card holder boards a vehicle (tapping-in), the system generates a temporary 
transaction record; after he/she leaves the vehicle (tapping-out), a complete record is stored with 
detailed trip information. 

A full bus trip may contain more than one stage, with transfers from one route/vehicle to another. The 
stage records are generated separately in the smart card system (one for each tapping-in and 
tapping-out). Since our goal is to identify in-vehicle encounters and the people one may encounter in 
vehicles will differ from stage to stage, we use the term trip to represent a stage in this document. 
After processing the raw data, we obtained the trip records used in this study. The fields and their 
contents are provided in Supplementary Table S1. 
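
To make the notion of an in-vehicle encounter concrete, the sketch below derives encounters from stage records by overlapping riding intervals. It is an illustration under stated assumptions, not the processing pipeline used in the study: the record layout (card, vehicle, board time, alight time, in seconds) is hypothetical, and `vehicle` is assumed to identify one specific vehicle run.

```python
from collections import defaultdict
from itertools import combinations


def build_encounters(stages):
    """Return (card_a, card_b, start, duration) for riders who shared a vehicle.

    `stages` is an iterable of (card, vehicle, board_time, alight_time).
    Two riders encounter each other when their riding intervals on the same
    vehicle overlap; the overlap length is the encounter duration.
    """
    by_vehicle = defaultdict(list)
    for card, vehicle, board, alight in stages:
        by_vehicle[vehicle].append((card, board, alight))

    encounters = []
    for riders in by_vehicle.values():
        # Pairwise comparison is quadratic per vehicle; fine for a sketch.
        for (a, board_a, alight_a), (b, board_b, alight_b) in combinations(riders, 2):
            start = max(board_a, board_b)
            overlap = min(alight_a, alight_b) - start
            if a != b and overlap > 0:
                encounters.append((a, b, start, overlap))
    return encounters


stages = [
    ("u1", "bus7", 0, 600),
    ("u2", "bus7", 300, 900),
    ("u3", "bus7", 700, 800),
    ("u4", "bus9", 0, 600),
]
encounters = build_encounters(stages)
```

Here u1 and u2 share 300 seconds, u2 and u3 share 100 seconds, and u1 never meets u3 because u1 alights before u3 boards.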

This study was performed on the trip records of one week in March, 2012. The 
dataset contains 22,455,159 bus trip transaction records from 2,969,320 individual 
smart card holders. 

Simulation. To evaluate the performance of social sensors in the obtained interaction network, we use the 
SEI model to simulate contagious outbreaks among all transit users 33, who are assumed to be in one of 
three states: susceptible (S) when they are prone to infection, exposed (E) between exposure and 
infectiousness, or infected (I) when they can transmit the disease to others. In studying the outbreak 
dynamics, we are more interested in the initial spreading processes and thus we do not consider the 
recovery stage in the simulation. 

All simulations start on Saturday and end on the next Friday, spanning the whole week (given the dataset). 
In the spreading process, the duration of the explosive stage (such that 0 < I/N < 1%) is highly 
determined by the number of index cases. Thus, a smaller number of index cases induces larger variation in 
the temporal spreading processes; however, after this explosion, the spreading becomes steady and 
contagion rate β determines the speed of the remaining spreading. Thus, to boost the initial spreading 
processes, we set ten index cases in our simulation, enabling us to observe outbreaks in one week. On the 
other hand, since people show great heterogeneity in their transit use behavior (such that f < 5 for 
almost 50% of the users during the week), a larger number of index cases also prevents the disease from 
dying out at the initial stage. However, as sensor performance is monitored given infected ratio 
5% ≤ F_P(t_i) < 25% (after the explosive stage and during the steady spreading), lead-times and their 
variability are mainly determined by β rather than the number of index cases (see Supplementary Fig. S5). 
Thus, all our simulations start with ten infected people (ten index cases), randomly selected across all 
transit users who were active on Saturday (who took buses on Saturday). After becoming infectious, 
individual i will transmit disease to a susceptible individual j, whom individual i encountered during 
his/her journey, with probability p_ij = β × d_ij (d_ij is encounter duration). Here, β is an important 
parameter determining the speed of contagious spreading. We chose a series of values from 0.001 to 0.005 
per 20 seconds. On one hand, these values are similar to the value used in Ref. 7. On the other hand, by 
simulating the spreading processes with different β, we can better evaluate the performance of different 
sensors for outbreaks that spread at different speeds. Given that any instantaneous network in a vehicle 
is fully connected, disease may spread very fast once one individual gets infected. To avoid this, we 
introduce the exposure (E) stage - which lasts for 2 hours - in our simulation: once an individual is 
exposed, he/she will not spread the disease immediately; rather, he/she begins to spread the disease to 
other encountered people after 2 hours. Considering that most transit trips take less than 2 hours, one is 
unlikely to get infected and begin spreading disease to others during the same trip. 

In the simulation, each time step represents 0.5 h, such as 7:00-7:30. In any step t, for each infectious 
individual, we first identify all the neighbors he/she has encountered (the time they encountered each 
other should be within this time step) and then get them exposed with the defined probability p_ij. The 
incubation time is selected as a constant given the time granularity (2 h), and thus, the exposed 
individuals become infectious in step t + 4. We also tested our results when setting the exposed period to 
be 6 h and 12 h, finding that sensors identified by degree perform consistently better than others 
(Supplementary Fig. S6). 
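
The time-stepped SEI procedure can be sketched as follows. This is a simplified sketch under stated assumptions rather than the simulation code used for the results: encounters are assumed to be pre-grouped by half-hour step, durations are converted to 20-second units, exposures within a step take effect at the end of that step, and all names are hypothetical.

```python
import random


def simulate_sei(encounters_by_step, index_cases, beta, n_steps, rng,
                 latent_steps=4, unit_seconds=20.0):
    """Run one SEI outbreak and return {person: step of exposure}.

    `encounters_by_step` maps a half-hour step to (i, j, duration_seconds)
    tuples. An exposed person becomes infectious `latent_steps` steps later
    (4 steps = 2 h). There is no recovery stage.
    """
    exposed_at = {case: 0 for case in index_cases}
    infectious_from = {case: 0 for case in index_cases}  # index cases start infectious
    for step in range(n_steps):
        newly_exposed = {}
        for i, j, duration in encounters_by_step.get(step, []):
            for source, target in ((i, j), (j, i)):
                if infectious_from.get(source, float("inf")) > step:
                    continue  # source is susceptible or still in the exposed stage
                if target in exposed_at or target in newly_exposed:
                    continue  # target is no longer susceptible
                p_ij = min(1.0, beta * duration / unit_seconds)
                if rng.random() < p_ij:
                    newly_exposed[target] = step
        for person, when in newly_exposed.items():
            exposed_at[person] = when
            infectious_from[person] = when + latent_steps
    return exposed_at


# Deterministic toy run: beta = 1 with 20-second contacts always transmits.
contacts = {0: [("a", "b", 20.0)], 1: [("b", "c", 20.0)], 4: [("b", "c", 20.0)]}
result = simulate_sei(contacts, ["a"], beta=1.0, n_steps=5, rng=random.Random(0))
```

In the toy run, b is exposed in step 0 but cannot infect c in step 1; transmission to c only happens in step 4, once the 2-hour exposed stage has elapsed.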

Based on these simulation settings, one can monitor the temporal spreading dynamics from a set of 
simulations with a certain β and random seeds as initial infected people. Meanwhile, individual infection 
time can be traced from each simulation. As Supplementary Fig. S1 shows, although the contagion rate in 
each panel is the same, simulations still differ significantly from each other, in particular when β is 
low. Thus, estimating lead-time universally is important to establish the difference. 